
***README***

* Title: Fast Track to Success? A Mixed Methods Evaluation of Condensed Course 
* Formats at Tennessee Community Colleges
* Authors: Kaylee Matheny, in collaboration with Madison Dell and on behalf
* of the author team
* Date: January 25, 2026

********************************************************************************

Background: This README provides additional context for the replication do file and data files for the manuscript noted above. Specifically, the replication data use deidentified versions of Tennessee Board of Regents (TBR) administrative data from all 13 community colleges governed by TBR in Tennessee. Here, I describe the following: 

i. The Deidentification Process
ii. Specific Definitions and Exclusions
iii. Supplemental Code Available Upon Request

It is also worth noting that at some points in the do file, variable names and commented code refer to "accelerated courses" and at other points "condensed courses." We use these interchangeably. 

********************************************************************************

i. The Deidentification Process

The raw administrative data include student IDs as unique internal student-level identifiers as well as specific course numbers (e.g., ACCT 1010 is Principles of Accounting I [1]) and institution IDs. 

To maintain student anonymity, we generate a new student-level identifier, student_id (based on random_id), and drop the administrative identifiers banner_id (unique to the student and institution) and random_id (unique to the student). 

To maintain course anonymity, we use the identified data to generate a set of anonymized fixed effects and course descriptors, as follows: 

	* course_section_id = Unique Course Section Variable 
	(Course Reference Number [CRN], College, & Term)
		* (based on crn, institution, and term code) 

	* course_num_start = Coarse Course Number 
	(first two digits of the four-digit course number; ex: 10 for ACCT 1010) 
		* (based on course number) 

	* course_fe = Unique Course Variable 
	(shared across campuses/terms) 
		* (based on subject code and course number)

	* course_title_id = Unique Course Title Variable
		* (based on course title) 

	* course_crn = Unique Course CRN Variable
		* (based on original CRN; renamed, grouped, and dropped) 

To maintain institution anonymity, we generate an anonymized institution identifier:

	* institution 
		* (based on the original institution variable, which was
		 renamed, randomly sorted, and then dropped)

We construct these variables from the identified data and then anonymize them using Stata's egen group() function. We then drop potentially identifying course-level variables, including CRN, title, number, section, and the full CIP (Classification of Instructional Programs) code. This step is particularly important for ensuring anonymity in courses with small enrollments. For each course in which a student is enrolled, the final deidentified dataset includes the modality of instruction, the term taught, the subject, the first two digits of the course number (to differentiate 1000-level from 2000-level courses in potential robustness analyses), and the two-digit CIP code. 

Notes: 
- Because egen group() preserves unique course IDs, coarsening course numbers does not prevent us from distinguishing between unique courses. 
- CRN is unique within an institution and term (but can be repeated in different terms at the same institution). 
- Tennessee's community colleges and universities have used common course numbering since 2010, when the Complete College Tennessee Act was passed and written into state law (Tenn. Code Ann. § 49-7-202(r)(3)(A)). Under this provision, "courses with common content will carry the same prefix, number, title, credits, description, and competencies," and credit is easily transferable between Tennessee's public higher education institutions. 
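The anonymization step can be illustrated with a minimal Python sketch of what Stata's `egen newvar = group(varlist)` does: it numbers each distinct combination of the listed variables 1, 2, ..., K in sorted order, after which the identifying source columns can be dropped. This is not the authors' do file. The field names (`random_id`, `crn`, `institution`, `term`, `subject`, `course_number`, `title`), the row-as-dictionary layout, and the seeded shuffle used for institutions (one possible reading of "randomly sorted") are assumptions for illustration, and Stata's handling of missing values is not reproduced.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def egen_group(rows: Sequence[dict], keys: Sequence[str]) -> List[int]:
    """Mimic Stata's egen group(): 1-based IDs for each distinct key combination, in sorted order."""
    combos = sorted({tuple(row[k] for k in keys) for row in rows})
    lookup: Dict[Tuple[Hashable, ...], int] = {combo: i + 1 for i, combo in enumerate(combos)}
    return [lookup[tuple(row[k] for k in keys)] for row in rows]


def random_institution_ids(names: Sequence[str], seed: int) -> List[int]:
    """Hypothetical: number institutions after a seeded shuffle so IDs do not follow name order."""
    distinct = sorted(set(names))
    random.Random(seed).shuffle(distinct)
    mapping = {name: i + 1 for i, name in enumerate(distinct)}
    return [mapping[name] for name in names]


def deidentify(rows: List[dict], seed: int) -> List[dict]:
    """Build anonymized IDs, then keep only non-identifying fields (hypothetical field names)."""
    student_ids = egen_group(rows, ["random_id"])
    section_ids = egen_group(rows, ["crn", "institution", "term"])
    course_fes = egen_group(rows, ["subject", "course_number"])
    title_ids = egen_group(rows, ["title"])
    inst_ids = random_institution_ids([row["institution"] for row in rows], seed)
    out = []
    for i, row in enumerate(rows):
        out.append({
            "student_id": student_ids[i],
            "course_section_id": section_ids[i],
            "course_fe": course_fes[i],
            "course_title_id": title_ids[i],
            "institution": inst_ids[i],
            # Coarse course number: first two digits of the four-digit number (e.g., "10" for 1010).
            "course_num_start": row["course_number"][:2],
            "subject": row["subject"],
            "term": row["term"],
        })
    # banner_id, random_id, crn, title, institution name, and full course number are not carried over.
    return out
```

In Stata itself, the corresponding step would look like `egen course_fe = group(subject course_number)` followed by `drop` of the source variables.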

********************************************************************************

ii. Specific Definitions and Exclusions

For enrollment-level data: 
**************************

For the enrollment-level data, TBR extracted all unique enrollments from Fall 2015 through Spring 2023. Using these raw data, we remove course sections that did not run and standardize course numbers by removing letters from them. We then construct the deidentified student IDs and course FEs (see section i above). The result is a deidentified data file with 4,207,550 enrollments. 
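A minimal Python sketch of the two cleaning steps, assuming each raw enrollment row carries a course number string (e.g., a hypothetical lettered number such as "1010L") and a hypothetical `section_ran` flag marking whether the section actually ran. The raw data's actual field for cancelled sections is not described here, so that flag is purely illustrative.

```python
import re
from typing import List

_LETTERS = re.compile(r"[A-Za-z]")


def uniform_course_number(raw: str) -> str:
    """Remove letters (then surrounding whitespace) so, e.g., '1010L' and '1010' match."""
    return _LETTERS.sub("", raw).strip()


def clean_enrollments(rows: List[dict]) -> List[dict]:
    """Drop sections that did not run, then standardize course numbers (hypothetical fields)."""
    cleaned = []
    for row in rows:
        if not row["section_ran"]:
            continue
        cleaned.append({**row, "course_number": uniform_course_number(row["course_number"])})
    return cleaned
```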

For student covariates: 
***********************

* We use a binary age indicator for traditional vs. non-traditional status (rather than a continuous age variable), because the age distribution is concentrated at younger ages and because the difference of interest is whether a student is considered "non-traditional" (not whether an additional year of age is associated with selection into or performance in condensed courses). 
* We include an indicator for female because the gender field takes only the values "male," "female," or "unknown" (rather than a more granular breakdown). 
* We include indicators for racial groups with at least 1,000 unique students. 
* We include an indicator for whether a student is a Pell recipient rather than using awarded dollar amounts, as the variation of interest is whether students are identified as high-need or not (and not, for example, whether students select into or perform differently in condensed courses for an additional dollar or additional $1,000 of Pell). 
* When HS GPA is missing (often for non-traditional students), we replace the missing value with the analytic sample median and generate an indicator for missing high school GPA. 
* For student-level data, we use the first non-missing value of each of the above variables across a student's course enrollments, treating "unknown" as missing. For example, if a student's gender is recorded as unknown in their first course enrollment and as female in their second, we use "female." 
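The covariate rules above can be sketched in Python as follows. This is a hypothetical illustration, not the authors' Stata code: the field names (`student_id`, `order` for chronological enrollment order, `gender`, `hs_gpa`, `race`) are assumptions, the median is taken over whichever students are passed in (standing in for the analytic sample), and the age indicator is omitted because its cutoff is not stated here.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Optional, Tuple

# Values treated as missing; treating "unknown" as missing follows the gender example above.
MISSING = {None, "", "unknown"}


def first_non_missing(enrollments: List[dict], field: str) -> Dict[int, Optional[object]]:
    """For each student, the first non-missing value of `field` across enrollments in time order."""
    result: Dict[int, Optional[object]] = {}
    for row in sorted(enrollments, key=lambda r: r["order"]):
        sid = row["student_id"]
        value = row.get(field)
        if result.get(sid) is None:  # not seen yet, or only missing values so far
            result[sid] = None if value in MISSING else value
    return result


def impute_hs_gpa(gpa: Dict[int, Optional[float]]) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Fill missing HS GPA with the median of observed values and flag imputed students."""
    fill = median(g for g in gpa.values() if g is not None)
    imputed = {sid: (g if g is not None else fill) for sid, g in gpa.items()}
    missing_flag = {sid: int(g is None) for sid, g in gpa.items()}
    return imputed, missing_flag


def race_indicators(race: Dict[int, Optional[str]], min_students: int = 1000) -> Dict[int, Dict[str, int]]:
    """0/1 indicators for each racial group with at least `min_students` unique students."""
    counts = Counter(r for r in race.values() if r not in MISSING)
    kept = sorted(g for g, n in counts.items() if n >= min_students)
    return {sid: {f"race_{g}": int(r == g) for g in kept} for sid, r in race.items()}
```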

For student-level data: 
***********************

The student-level dataset includes seven fall cohorts, 2015 through 2021; we exclude later cohorts because the student-level outcomes of greatest interest are measured over longer time horizons. Cohorts include only students identified as freshmen (student level = 1) who were enrolled in college for the first time (registration type = 1). "Enrolled in college for the first time" means students were (1) enrolled for the first time in a fall semester; or (2) enrolled for the first time in summer and returned in the immediately following fall. Students enrolled in 12 or more semester hours of degree credit are considered full-time (although our analysis is not limited to full-time students). 
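The cohort definition can be sketched in Python, assuming a hypothetical term encoding of (calendar year, season) and hypothetical field names (`student_level`, `registration_type`, `first_term`, `terms_enrolled`, `degree_credit_hours`). In the actual data, registration type may already encode the first-time rule; the explicit summer-to-fall check here only illustrates the definition.

```python
from typing import List, Optional, Set, Tuple

Term = Tuple[int, str]  # (calendar year, season), e.g., (2015, "fall"); hypothetical encoding


def cohort_fall_year(first_term: Term, terms_enrolled: Set[Term]) -> Optional[int]:
    """Fall cohort for a first-time student: fall starters, or summer starters who return that fall."""
    year, season = first_term
    if season == "fall":
        return year
    if season == "summer" and (year, "fall") in terms_enrolled:
        return year
    return None


def build_cohorts(students: List[dict], first_year: int = 2015, last_year: int = 2021) -> List[dict]:
    """Keep first-time freshmen (level 1, registration type 1) in fall cohorts first_year..last_year."""
    kept = []
    for s in students:
        if s["student_level"] != 1 or s["registration_type"] != 1:
            continue
        cohort = cohort_fall_year(s["first_term"], s["terms_enrolled"])
        if cohort is None or not first_year <= cohort <= last_year:
            continue
        # Full-time is only a flag; part-time students stay in the sample.
        kept.append({**s, "cohort": cohort, "full_time": int(s["degree_credit_hours"] >= 12)})
    return kept
```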

We keep only students who also appear in the enrollment-level analytic data file. A student might appear in the enrollment-level analytic data but not in the student-level data because they are not a first-time freshman in one of the cohorts above; this restriction is straightforward. 

A student might appear in the student-level data but not in the enrollment-level analytic data because they enrolled only in courses we exclude from analysis (e.g., dual enrollment). We also lack coursetaking outcomes and covariates for these students. This restriction excludes a small share of students relative to the full student-level data file. 

Finally, we exclude a very small subset of students who appear in both data files but whose first-time freshman semester included only courses we exclude from analysis. For example, a student whose first-time freshman semester included only an e-campus course, and who later enrolled in a slate of regular courses, is considered a first-time student only in that first semester; because they were not enrolled in any 15- or 7-week courses in the analytic sample in that semester, they are excluded from analysis. 
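Both student-level restrictions can be expressed as a single check, sketched here in Python under assumed field names (`student_id`, `first_term`, `term`). The sketch assumes the enrollment rows passed in are already the analytic sample (i.e., excluded course types such as dual enrollment and e-campus courses have been removed), and it is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set


def restrict_to_analytic(students: List[dict], analytic_enrollments: List[dict]) -> List[dict]:
    """Keep students with at least one analytic-sample enrollment in their first-time freshman term."""
    analytic_terms: Dict[int, Set[str]] = defaultdict(set)
    for e in analytic_enrollments:
        analytic_terms[e["student_id"]].add(e["term"])
    # A student absent from the enrollment file has no analytic terms, so this one check drops
    # both students missing from that file and those whose first term had only excluded courses.
    return [s for s in students if s["first_term"] in analytic_terms.get(s["student_id"], set())]
```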

********************************************************************************

iii. Supplemental Code Available Upon Request

The replication do file includes all analyses described in the paper. In arriving at this set of analyses, we also estimated alternative specifications, including the following: 

	* Selection specifications with fewer fixed effects
	* Fixed effects models with different sets of fixed effects
	* Instrumental variable models with course FEs instead of subject FEs
	* Heterogeneity analyses by part of term (e.g., first vs. second fall session), term, subject, and course
	* Instrumental variable models at the student level

In addition, the data-construction do files that precede the replication do file have been omitted because they draw on identifying data to construct the analytic data files. We would be happy to share these do files for external review (with the permission of our TBR partners), but we are unable to share the underlying administrative data. 

Sources: 
[1] https://catalog.chattanoogastate.edu/content.php?catoid=38&navoid=8158
